Bioinformatics (Thomas Dandekar, Meik Kunz)


surprising that for all the systems biology properties discussed in Sect. 5.1, we can now tell through concrete data analysis which molecules are involved in the individual feedback loops, in signalling cascades and in the individual building units (modules).

But larger contours are also becoming visible. For example, the importance of RNA as a level of cell regulation had long been underestimated; it has only been fully realised in recent years with the discovery of lncRNAs (long non-coding RNAs) and miRNAs (microRNAs) in higher cells and sRNAs (small RNAs) in bacteria. For instance, an important lncRNA (Xist RNA) inactivates the second X chromosome in females and is therefore involved in this fundamental difference between males and females. miRNA-21, by contrast, suppresses phosphatases such as PTEN and stimulates tumor growth, which makes it an important tumor marker. To understand this new level of cellular regulation, integrative bioinformatic analysis of the transcriptome (and its interplay with other omics domains) is a crucial prerequisite (e.g. two of our papers, Fuchs et al. 2020 and Stojanović et al. 2020, show a link of RNA and proteome to miRNA regulation in cardiac and pulmonary fibrosis).

A second example of a deeper understanding of the design principles of our cells is tissue replacement by artificial tissue or stem cells. Here, bioinformatics is essential to uncover signaling pathways and to generate suitable tissue or reprogram stem cells.

Another current application of the cell’s design principles is protein design: bioinformatics and experiments systematically change protein structures to investigate how a protein acquires new properties. With the large number of protein structures now available (e.g. 3D coordinates from the PDB database), this works well enough that it is being used more and more actively. First of all, the protein structure has to be predicted. This can be done particularly well using a template (a protein with a known structure; “homology modelling”), for example with the SWISS-MODEL software (Waterhouse et al. 2018). All known structural domains in a protein can be found with AnDOM (3D domain annotation). If there is not sufficient similarity (approximately 62% same/similar amino acids) to a known protein structure, one can determine the best matching structure by threading the sequence onto all known structures (“threading”; e.g., the I-TASSER server, Zheng et al. 2019a, or LOMETS, Zheng et al. 2019b), or by protein folding simulations (“ab initio”; e.g., the QUARK server; Zheng et al. 2019a).
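The choice between these prediction routes can be illustrated with a minimal Python sketch. It assumes that the two sequences have already been aligned (gaps written as "-"), that plain percent identity stands in for the "same/similar amino acids" measure (a simplification, since similar residues are not counted), and that the approximate 62% figure from the text is treated as a hard cutoff. The function names `percent_identity` and `choose_method` are hypothetical and do not belong to any of the tools mentioned.

```python
# Assumption: the ~62% value from the text is used here as a hard cutoff.
SIMILARITY_CUTOFF = 62.0


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over aligned columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue (no gap).
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def choose_method(similarity_percent: float,
                  cutoff: float = SIMILARITY_CUTOFF) -> str:
    """Pick a structure prediction route from template similarity."""
    if similarity_percent >= cutoff:
        # A sufficiently similar template exists.
        return "homology modelling"
    # No sufficiently similar template is available.
    return "threading or ab initio folding"


# Toy alignment of a query against a hypothetical template sequence.
identity = percent_identity("ACDE-F", "ACDQGF")
print(identity, choose_method(identity))
```

In practice the similarity value would come from a real alignment against template structures, and the cutoff is a rule of thumb rather than a sharp boundary.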

This is followed by the design step: for about three decades, ligands and pharmaceuticals have been optimized to better fit the protein structure, e.g. the receptor. Drugs against HIV infection have often been obtained by design. More recently, protein structures have been actively incorporated into simulations and predictions, using high-throughput experimental methods (Lam et al. 2018; Dominguez et al. 2017), and catalysis in enzymes and receptor function are understood better and better (Mahalapbutr et al. 2020; Sgrignani et al. 2020). However, knowledge of the protein structure can also be used to selectively alter the protein structure itself, for example to improve enzyme activities (Leman et al. 2020; Rosetta software) and to systematically change protein building units, even to combine them into logic circuits (Chen et al. 2020); adding or swapping secondary structure in particular is now easy.

11 Design Principles of a Cell